Skip to content

Bug 2023624 - Some older repos have comments that have empty comment body and user object is null. The ETL script needs to handle these better instead of crashing#12

Merged
dklawren merged 2 commits intomainfrom
2023624
Mar 18, 2026
Merged

Bug 2023624 - Some older repos have comments that have empty comment body and user object is null. The ETL script needs to handle these better instead of crashing#12
dklawren merged 2 commits intomainfrom
2023624

Conversation

@dklawren
Copy link
Contributor

  • extract_reviewers: filters out any review where user is null, preserving empty-body reviews (e.g. approve without comment)
  • extract_comments: filters out any comment where user is null or body is empty
  • transform_data: defensive (review.get("user") or {}) so a null user won't raise AttributeError even if it somehow reaches the transform

…body and user object is null. The ETL script needs to handle these better instead of crashing
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds defensive handling for GitHub PR reviews/comments where user is null or body is empty, preventing ETL script crashes on older repositories with malformed data.

Changes:

  • extract_reviewers: Filters out reviews with null user and logs skipped count
  • extract_comments: Filters out comments with null user or empty body and logs skipped count
  • transform_data: Uses (review.get("user") or {}) pattern to safely handle null user objects

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

@dklawren dklawren requested a review from shtrom March 17, 2026 14:06
Copy link
Member

@shtrom shtrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the error we saw will be addressed by this fix.

Is there a way you could run one iteration of the loop against a given PR locally? That would make it easier to reproduce topical issues.

Copy link
Member

@shtrom shtrom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I didn't consider all options.

I thought review== so that review.get raises that error
however I now think you may be right about review['user']==None, in which case your fix is correct.

In any case, I think that call for some local one-shot runability to confirm (:

@dklawren dklawren merged commit 161dc95 into main Mar 18, 2026
4 checks passed
@dklawren dklawren deleted the 2023624 branch March 18, 2026 14:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants